ABSTRACT

Word clouds are a popular tool for visualizing documents, 
but they are not a good tool for comparing documents, be- 
cause identical words are not presented consistently across 
different clouds. We introduce the concept of word storms, 
a visualization tool for analysing corpora of documents. A 
word storm is a group of word clouds, in which each cloud 
represents a single document, juxtaposed to allow the viewer 
to compare and contrast the documents. We present a novel 
algorithm that creates a coordinated word storm, in which 
words that appear in multiple documents are placed in the 
same location, using the same color and orientation, in all 
of the corresponding clouds. In this way, similar documents 
are represented by similar-looking word clouds, making them
easier to compare and contrast visually. We evaluate the al- 
gorithm in two ways: first, an automatic evaluation based 
on document classification; and second, a user study. The 
results confirm that unlike standard word clouds, a coor- 
dinated word storm better allows for visual comparison of 
documents. 

Categories and Subject Descriptors 

H.5 [Information Search and Retrieval]: Information
Interfaces and Presentation 

1. INTRODUCTION

Because of the vast number of text documents on the Web, 
there is a demand for ways to allow people to scan large 
numbers of documents quickly. A natural approach is
visualization, in the hope that visually scanning a picture
may be easier for people than reading text. One of the most
popular visualization methods for text documents is the word
cloud. A word cloud is a graphical presentation of a document,
usually generated by plotting the document's most
common words in two-dimensional space, with each word's
frequency indicated by its font size. Word clouds have the
advantages that they are easy for naive users to interpret
and that they can be aesthetically surprising and pleasing.
One of the most popular cloud generators, Wordle, has
generated over 1.4 million clouds that have been publicly posted.

Despite their popularity for visualizing single documents, 
word clouds are not useful for navigating groups of docu- 
ments, such as blogs or Web sites. The key problem is that 
word clouds are difficult to compare visually. For example, 
say that we want to compare two documents, so we build a 
word cloud separately for each document. Even if the two 
documents are topically similar, the resulting clouds can be 
very different visually, because the shared words between 
the documents are usually scrambled, appearing in different 
locations in each of the two clouds. The effect is that it is 
difficult to determine which words are shared between the 
documents. 

In this paper, we introduce the concept of word storms 
to afford visual comparison of groups of documents. Just 
as a storm is a group of clouds, a word storm is a group of 
word clouds. Each cloud in the storm represents a subset of 
the corpus. For example, a storm might contain one cloud 
per document, or alternatively one cloud to represent all the 
documents written in each year, or one cloud to represent 
each track of an academic conference, etc. Effective storms 
make it easy to compare and contrast documents visually. 
We propose several principles behind effective storms, the 
most important of which is that similar documents should 
be represented by visually similar clouds. To achieve this, 
algorithms for generating storms must perform layout of the 
clouds in a coordinated manner. 

We present a novel algorithm for generating coordinated 
word storms. Its goal is to generate a set of visually ap- 
pealing clouds, under the constraint that if the same word 
appears in more than one cloud in the storm, it appears in 
a similar location. Interestingly, this also allows a user to 
see when a word is not in a cloud: simply find the desired 
word in one cloud and check the corresponding locations 
in all the other clouds. At a technical level, our algorithm 
combines the greedy randomized layout strategy of Wor- 
dle, which generates aesthetically pleasing layouts, with an 
optimization-based approach to maintain coordination be- 
tween the clouds. The objective function in the optimiza- 
tion measures the amount of coordination in the storm and 
is inspired by the theory of multidimensional scaling. 

We apply this algorithm on a variety of text corpora, in- 
cluding academic papers and research grant proposals. We 
evaluate the algorithm in two ways. First, we present a novel 
automatic evaluation method for word storms based on how 
well the clouds, represented as vectors of pixels, serve as
features for document classification. The automatic evaluation
allows us to rapidly compare different layout algorithms.
However, this evaluation is not specific to word clouds and 
may be of independent interest. Second, we present a user 
study in which users are asked to examine and compare the 
clouds in a storm. Both experiments demonstrate that a
coordinated word storm is dramatically better than independent
word clouds at allowing users to visually compare and con- 
trast documents. 

2. DESIGN PRINCIPLES 

In this section we introduce the concept of a word storm, 
describe different types of storms, and present design prin- 
ciples for effective storms. 

A word storm is a group of word clouds constructed for the 
purpose of visualizing a corpus of documents. In the sim- 
plest type of storm, each cloud represents a single document 
by creating a summary of its content; hence, by looking at 
the clouds a user can form a quick impression of the cor- 
pus's content and analyse the relations among the different 
documents. 

We build on word clouds in our work because they are 
a popular way of visualising single documents. They are 
very easy to understand and they have been widely used to 
create appealing figures. By building a storm based on word 
clouds, we create an accessible tool that can be understood 
easily and used without requiring a background in statistics. 
The aim of a word storm is to extend the capabilities of a 
word cloud: instead of visualizing just one document, it is 
used to visualize an entire corpus. 

There are two high-level design motivations behind the
concept of word storms. The first design motivation is to
visualize high-dimensional data in a high-dimensional space.
Many classical visualization techniques are based on
dimensionality reduction, i.e., mapping high-dimensional data into
a low-dimensional space. Word storms take an alternative
strategy, mapping high-dimensional data into a different
high-dimensional space, but one that is tailored for human
visual processing. This is a similar strategy to approaches like
Chernoff faces [2]. The second design motivation is the
principle of small multiples [11, 12], in which similar
visualizations are presented together in a table so that the eye is
drawn to the similarities and differences between them. A
word storm is a small multiple of word clouds. This
motivation strongly influences the design of effective clouds, as
described in Section 2.3.

2.1 Types of Storms 

Different types of storms can be constructed for different 
data analysis tasks. In general, the individual clouds in a 
storm can represent a group of documents rather than a 
single document. For example, a cloud could represent all 
the documents written in a particular month, or that appear 
on a particular section of a web site. It would be typical to do 
this by simply merging all of the documents in each group, 
and then generating the storm with one cloud per merged 
document. This makes the storm a flexible tool that can be 
used for different types of analysis, and it is possible to create 
different storms from the same corpus and obtain different 
insights on it. Here are some example scenarios: 

1. Comparing Individual Documents. If the goal is to
compare and contrast individual documents in a corpus,
then we can build a storm in which each word cloud
represents a single document.

2. Temporal Evolution of Documents. If we have a 
set of documents that have been written over a long pe- 
riod, such as news articles, blog posts, or scientific docu- 
ments, we may want to analyze how trends in the docu- 
ments have changed over time. This is achieved using a 
word storm in which each cloud represents a time period, 
e.g., one cloud per week or per month. By looking at 
the clouds sequentially, the user can see the appearance 
and disappearance of words and how their importance 
changes over time. 

3. Hierarchies of Documents. If the corpus is arranged 
in a hierarchy of categories, we can create a storm which 
contains one cloud for each of the categories and subcat- 
egories. This allows for hierarchical interaction, in which 
for every category of the topic hierarchy, we have a storm 
that contains one cloud for each subcategory. For in- 
stance, this structure can be useful in a corpus of scientific 
papers. At the top level, we would first have a storm that 
contains one cloud for each scientific field (e.g., chemistry, 
physics, engineering), then for each field, we also have a 
separate storm that includes one cloud for each subfield 
(such as organic chemistry, inorganic chemistry) and so 
on until arriving at the articles. An example of this type 
of storm is shown in Figures 2 and 3.

To keep the explanations simple, when describing the algo- 
rithms later on, we will assume that each cloud in the storm 
represents a single document, with the understanding that 
the "document" in this context may have been created by 
concatenating a group of documents, as in the storms of
types 2 and 3 above.

2.2 Levels of Analysis of Storms 

A single word storm allows the user to analyse the corpus 
at a variety of different levels, depending on what type of 
information is of most interest, such as: 

1. Overall Impression. By scanning the largest terms 
across all the clouds, the user can form a quick impression 
of the topics of the whole corpus.

2. Comparison of Documents. As the storm displays 
the clouds together, the user can easily compare them and 
look for similarities and differences among the clouds. For 
example, the user can look for words that are much more 
common in one document than another. Also the user can 
compare whether two clouds have similar shape, to gauge 
the overall similarity of the corresponding documents. 

3. Analysis of Single Documents. Finally, the clouds 
in the storm have meaning in themselves. Just as with 
a single word cloud, the user can analyze an individual 
cloud to get an impression of a single document. 

2.3 Principles of Effective Word Storms 

Because they support additional types of analysis, the
principles for effective word storms are different from those for
individual clouds. This section describes some desirable
properties of effective word storms.

First of all, each cloud should be a good representation
of its document. That is, each cloud ought to emphasize
the most important words so that the information that it
transmits is faithful to its content. Each cloud in a storm
should be an effective visualization in its own right.

Figure 1: Papers from the ICML 2012 conference, represented as a word storm. These 8 clouds represent the papers in the Optimization
Algorithms track.

Further principles follow from the fact that the clouds 
should also be built taking into account the roles they will 
play in the complete storm. In particular, clouds should be 
designed so that they are effective as small multiples [11, 12],
that is, they should be easy to compare and contrast. This 
has several implications. First, clouds should be similar so 
that they look like multiples of the same thing, making the 
storm a whole unit. Because the same structure is main- 
tained across the different clouds, they are easier to compare, 
so that the viewer's attention is focused on the differences 
among them. A related implication is that the clouds ought 
to be small enough that viewers can analyze multiple clouds 
at the same time without undue effort. 

The way the clouds are arranged and organised on the can- 
vas can also play an important role, because clouds are prob- 
ably more easily compared to their neighbours than to the 
more distant clouds. This suggests a principle that clouds in 
a storm should be arranged to facilitate the most important 
comparisons. In the current paper, we take a simple ap- 
proach to this issue, simply arranging the clouds in a grid, 
but in future work it might be a good option to place sim- 
ilar clouds closer together so that they can be more easily 
compared. 

A final, and perhaps the most important, principle is one 
that we will call the coordination of similarity principle. In 
an effective storm, visual comparisons between clouds should 
reflect the underlying relationships between documents, so 
that similar documents should have similar clouds, and dis- 
similar documents should have visually distinct clouds. This 
principle has particularly strong implications. For instance, 
to follow this principle, words should appear in a similar 
font and similar colours when they appear in multiple clouds. 
More ambitiously, words should also have approximately the
same position across all clouds in which they appear.



Following the coordination of similarity principle can
significantly enhance the usefulness of the storm. For example,
a common operation when comparing word clouds is to
find and compare words between the clouds, e.g., once
a word is spotted in a cloud, checking if it also appears in 
other clouds. By displaying shared words in the same color 
and position across clouds, it is much easier for a viewer to 
determine which words are shared across clouds, and which 
words appear in one cloud but not in another. Furthermore, 
making common words look the same tends to cause the 
overall shape of the clouds of similar documents to appear 
visually similar, allowing the viewer to assess the degree of 
similarity of two documents without needing to fully scan 
the clouds. 

Following these principles presents a challenge for algo- 
rithms that build word storms. Existing algorithms for build- 
ing single word clouds do not take into account relationships 
between multiple clouds in a storm. In the next sections we 
propose new algorithms for building effective storms. 

3. CREATING A SINGLE CLOUD 

In this section, we describe the layout algorithm for sin- 
gle clouds that we will extend when we present our new 
algorithm for word storms. The method is based closely on 
that of Wordle [6], because it tends to produce aesthetically
pleasing clouds. Formally, we define a word cloud as a set
of words $W = \{w_1, \ldots, w_M\}$, where each word $w \in W$ is
assigned a position $p_w = (x_w, y_w)$ and visual attributes that
include its font size $s_w$, color $c_w$, and orientation $o_w$
(horizontal or vertical).

Figure 2: These clouds represent 6 EPSRC Scientific Programmes. The document for each programme is obtained by concatenating all
of its grant abstracts.

Figure 3: A word storm containing six randomly sampled grants from the Complexity Programme (which was Cloud (e) in
Figure 2). The word "complex", which appeared in only one cloud in Figure 5, appears in all clouds in this figure. As this
word conveys more information in Figure 2 than it does here, it is colored more transparently here.

To select the words in a cloud, we choose the top M words
from the document by term frequency, after removing stop
words. A more general measure of the weight of each term,
such as tf*idf, could be used instead; for this reason we use
term weight to refer to whatever measure we have selected
for the importance of each term. The font size is set
proportional to the term's frequency, and the color and orientation
are selected randomly.
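
The word selection step can be sketched in a few lines of Python. This is a minimal illustration rather than the implementation used in the paper: the tokenizer, the tiny stop-word list, the function name select_words and the max_font scaling constant are all assumptions made for the example.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; a real list would be much longer.
STOP_WORDS = {"the", "a", "of", "and", "to", "in", "is", "are"}


def select_words(text, max_words, max_font=48.0):
    """Return {word: font_size} for the max_words most frequent non-stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(t for t in tokens if t not in STOP_WORDS)
    top = counts.most_common(max_words)
    if not top:
        return {}
    largest = top[0][1]
    # Font size is proportional to term frequency; the scaling so that the
    # most frequent word gets max_font is an assumption of this sketch.
    return {word: max_font * count / largest for word, count in top}


sizes = select_words(
    "the cloud of words and the storm of clouds: cloud, cloud, storm", 2
)
```

Replacing the raw counts with another term weight, such as tf*idf, would only change how `counts` is computed.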

Choosing the word positions is more complex, because the 
words must not overlap on the canvas. We use the layout 
algorithm from Wordle [6], which we will refer to as the
Spiral Algorithm. 

This algorithm is greedy and incremental; it sets the
location of one word at a time in order of weight. In other
words, at the beginning of the $i$-th step, the algorithm has
generated a partial word cloud containing the $i - 1$ words
of largest weight. To add a word $w$ to the cloud, the
algorithm places it at an initial desired position $p_w$ (e.g., chosen
randomly). If at that position $w$ does not intersect any
previous words and is entirely within the frame, we go on to
the next word. Otherwise, $w$ is moved one step outwards
along a spiral path. The algorithm keeps moving the word
along the spiral until it finds a valid position, that is, one
where it does not overlap any other word and is inside the
frame. Then, it moves on to the next word. This procedure is
shown in Algorithm 1.

We set each word's desired position randomly by sampling
a 2D Gaussian distribution whose mean is at the center of
the word cloud frame. The variance is adjusted depending
on the width and the height of the word cloud frame. If a
word placed at the sampled position falls outside the frame or
intersects its border, the position is resampled until the word
lies inside.
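
As a rough sketch of this sampling step, the following Python function draws a word's center from a Gaussian centered on the frame and rejects samples for which the word's bounding box would leave the frame. The function name, the use of a bounding box, and the `spread` factor that ties the standard deviation to the frame size are assumptions of this example; the text above only states that the variance depends on the frame's width and height.

```python
import random


def sample_desired_position(frame_w, frame_h, word_w, word_h, rng, spread=0.2):
    """Sample the top-left corner of a word box that lies fully inside the frame."""
    if word_w > frame_w or word_h > frame_h:
        raise ValueError("word does not fit in the frame")
    while True:
        # Center of the word, drawn from a Gaussian centered on the frame.
        cx = rng.gauss(frame_w / 2, spread * frame_w)
        cy = rng.gauss(frame_h / 2, spread * frame_h)
        x, y = cx - word_w / 2, cy - word_h / 2
        # Resample until the whole box is inside the frame.
        if 0 <= x and x + word_w <= frame_w and 0 <= y and y + word_h <= frame_h:
            return x, y
```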

Note that the algorithm assumes that the size of the frame 
is given. To choose the size of the frame, we estimate the 
necessary width and the height to fit M words. This choice 
will affect the compactness of the resulting word cloud. If the 
frame is too big, the words will find valid locations quickly 
but the resulting cloud will contain a lot of white space. If the
frame is too small, it will be more difficult or impossible to
fit all the words. A maximum number of iterations is set to 
prevent words from looping forever. If one word reaches the 
maximum number of iterations, we assume that the word 
cannot fit in the current configuration. In that case, the 
algorithm is restarted with a larger frame. 



Algorithm 1 Spiral Algorithm

Require: Words $W$, optionally positions $p = \{p_w\}_{w \in W}$
Ensure: Final positions $p = \{p_w\}_{w \in W}$
for all words $w \in \{w_1, \ldots, w_M\}$ do
    if initial position $p_w$ unsupplied, sample from Gaussian
    count $\leftarrow 0$
    while $p_w$ not valid $\wedge$ count < MaxIteration do
        Move $p_w$ one step along a spiral path
        count $\leftarrow$ count + 1
    end while
    if $p_w$ not valid then
        Restart with a larger frame
    end if
end for
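
The following Python sketch illustrates the structure of the Spiral Algorithm under simplifying assumptions. Words are treated as axis-aligned bounding boxes rather than glyph shapes, the spiral step sizes and the frame growth factor of 1.2 are arbitrary choices, and an initial position that falls outside the frame is corrected by the spiral search itself instead of being resampled. None of these names or constants come from Wordle or WordCram.

```python
import math
import random


def overlaps(a, b):
    """True if axis-aligned boxes a and b, each (x, y, w, h), overlap."""
    return (a[0] < b[0] + b[2] and b[0] < a[0] + a[2]
            and a[1] < b[1] + b[3] and b[1] < a[1] + a[3])


def inside(box, frame_w, frame_h):
    """True if box (x, y, w, h) lies entirely within the frame."""
    return (box[0] >= 0 and box[1] >= 0
            and box[0] + box[2] <= frame_w and box[1] + box[3] <= frame_h)


def spiral_layout(sizes, frame, desired=None, max_iter=2000, seed=0):
    """Place words greedily; sizes is {word: (w, h)} in decreasing weight.

    Returns ({word: (x, y)}, final_frame)."""
    rng = random.Random(seed)
    desired = desired or {}
    fw, fh = frame
    while True:
        placed = {}  # word -> (x, y, w, h)
        for word, (w, h) in sizes.items():
            if word in desired:
                x0, y0 = desired[word]
            else:
                x0 = rng.gauss(fw / 2, fw / 6) - w / 2
                y0 = rng.gauss(fh / 2, fh / 6) - h / 2
            for count in range(max_iter):
                # Step count = 0 tests the desired position itself.
                theta = 0.25 * count
                r = 0.5 * theta
                box = (x0 + r * math.cos(theta), y0 + r * math.sin(theta), w, h)
                if inside(box, fw, fh) and not any(
                        overlaps(box, other) for other in placed.values()):
                    placed[word] = box
                    break
            else:
                break  # this word could not be placed; give up on this frame
        if len(placed) == len(sizes):
            return {word: box[:2] for word, box in placed.items()}, (fw, fh)
        fw, fh = fw * 1.2, fh * 1.2  # restart with a larger frame


sizes = {"storm": (40, 20), "cloud": (30, 15), "word": (30, 15)}
positions, final_frame = spiral_layout(sizes, (200, 100))
```

Passing a `desired` dictionary fixes the starting point of the search for the listed words, which is the hook a coordinated layout needs.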



In order to decide if two words intersect, we check them 
at the glyph level, instead of only considering a bounding 
box around the word. This ensures a more compact re- 
sult. However, checking the intersection of two glyphs can 
be expensive, so instead we use a tree of rectangular bound- 
ing boxes that closely follows the shape of the glyph, as in 
[6]. We use the implementation of this approach in the open
source library WordCram.¹

¹ http://wordcram.org



4. CREATING A STORM 

In this section, we present novel algorithms to build a 
storm. The simplest method would of course be to simply
run the single-cloud algorithm of Section 3 independently for
each document, but the resulting storms would typically
violate the principle of coordination of similarity (Section 2.3)
because words will tend to have different colors, orientations, 
and layouts even when they are shared between documents. 
Instead, our algorithms will coordinate the layout of differ- 
ent clouds, so that when words appear in more than one 
cloud, they have the same color, orientation, and position. 
In this way, if the viewer finds a word in one of the clouds, 
it is easy to check if it appears in any other clouds. 

We represent each document as a vector $u_i$, where $u_{iw}$ is
the count of word $w$ in document $i$. A word cloud $v_i$ is a
tuple $v_i = (W_i, \{p_{iw}\}, \{c_{iw}\}, \{s_{iw}\})$, where $W_i$ is the set of
words that are to be displayed in cloud $i$, and for any word
$w \in W_i$, we define $p_{iw} = (x_{iw}, y_{iw})$ as the position of $w$ in
the cloud, $c_{iw}$ the color, and $s_{iw}$ the font size. We write
$p_i = \{p_{iw} \mid w \in W_i\}$ for the set of all word locations in $v_i$.

Our algorithms will focus on coordinating word locations 
and attributes of words that are shared in multiple clouds in 
a storm. However, it is also possible to select the words that 
are displayed in each cloud in a coordinated way that con- 
siders the entire corpus. For example, instead of selecting 
words by their frequency in the current document, we could 
use global measures, such as tf*idf, that could emphasize
the differences among clouds. We tried a few preliminary 
experiments with this but subjectively preferred storms pro- 
duced using tf. 

4.1 Coordinated Attribute Selection 

A simple way to improve the coordination of the clouds
in a storm is to ensure that words that appear in more than
one cloud are displayed with the same color and orientation
across clouds. We can go further than this, however,
by encoding information in the words' color and orientation.
In our case, we decided to use color as an additional
way of encoding the relevance of a term in the document.
Rather than encoding this information in the hue, which
would require a model of color saliency, we control
the color transparency. We choose the alpha channel of the
color to correspond to the inverse document frequency idf
of the word in the corpus. Words that appear in
a small number of documents therefore have opaque colors, while
words that occur in many documents are more transparent.
In this way the color choice emphasizes differences
among the documents, by making more informative words
more noticeable.
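
One possible mapping from idf to transparency is sketched below in Python. The linear rescaling of idf to the interval [min_alpha, 1], the floor min_alpha (so that common words never vanish entirely) and the function name idf_alpha are assumptions of this example; the text only specifies that alpha corresponds to the word's idf.

```python
import math


def idf_alpha(documents, min_alpha=0.2):
    """Map each word to an alpha value in [min_alpha, 1] based on its idf.

    documents is a list of sets of words, one set per cloud."""
    n = len(documents)
    df = {}
    for doc in documents:
        for word in doc:
            df[word] = df.get(word, 0) + 1
    if n <= 1:
        # With a single document idf carries no information; draw fully opaque.
        return {word: 1.0 for word in df}
    max_idf = math.log(n)  # idf of a word that occurs in exactly one document
    return {
        word: min_alpha + (1 - min_alpha) * math.log(n / count) / max_idf
        for word, count in df.items()
    }


alpha = idf_alpha([
    {"materials", "electron"},
    {"materials", "metal"},
    {"materials", "light"},
])
```

Here "materials", which occurs in every document, receives the lowest alpha, while words unique to one document are fully opaque.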

4.2 Coordinated Layout: Iterative Algorithm 

Coordinating the positions of shared words is much more
difficult than coordinating the visual attributes. In this
section we present the first of three algorithms for coordinating
word positions. In the same manner that we have set
the color and the orientation, we want to set the position
$p_{iw} = p_{jw}$ for all $v_i, v_j \in V_w$, where $V_w$ is the set of clouds that
contain word $w$. The task is more challenging because it
adds an additional constraint to the layout algorithm.
Instead of only avoiding overlaps, now we have the constraint
of placing the words in the same position across the clouds.
In order to do so, we present a layout algorithm that
iteratively generates valid word clouds, changing the location of
the shared words to make them converge to the same
position in all clouds. We will refer to this procedure as the
iterative layout algorithm, which is shown in Algorithm 2.

Algorithm 2 Iterative Layout Algorithm

Require: Storm $v_i = (W_i, \{c_{iw}\}, \{s_{iw}\})$ without positions
Ensure: Word storm $\{v_1, \ldots, v_N\}$ with positions
for $i \in \{1, \ldots, N\}$ do
    $p_i \leftarrow$ SpiralAlgorithm($W_i$)
end for
while not converged $\wedge$ count < MaxIteration do
    for $i \in \{1, \ldots, N\}$ do
        $p'_{iw} \leftarrow \frac{1}{|V_w|} \sum_{v_j \in V_w} p_{jw}, \quad \forall w \in W_i$
        $p_i \leftarrow$ SpiralAlgorithm($W_i$, $p'_i$)
    end for
    count $\leftarrow$ count + 1
end while

In particular, the iterative layout algorithm works by
repeatedly calling the spiral algorithm (Section 3) with different
desired locations for the shared words. At the first iteration,
the desired locations are set randomly, in the same way
we did for a single cloud. Subsequently, the new desired
locations are chosen by averaging the previous final locations
of the word in the different clouds. That is, the new desired
location for word $w$ is $p'_w = |V_w|^{-1} \sum_{v_j \in V_w} p_{jw}$. Thus, the
new desired locations are the same for all clouds $v_j \in V_w$:
$p'_{jw} = p'_w$. Changing the locations of shared words might
introduce new overlaps, so we run the Spiral Algorithm again
to remove any overlaps.
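
The averaging step can be written compactly. The Python sketch below computes, for every word, the mean of its current positions over the clouds that contain it; the data layout (one dictionary per cloud) and the function name are assumptions of this example. In a full implementation the result would be passed back to the single-cloud layout routine as the desired positions for the next iteration.

```python
def average_shared_positions(positions):
    """Average each word's position over the clouds that contain it.

    positions is a list of {word: (x, y)} dictionaries, one per cloud.
    Returns {word: (mean_x, mean_y)}."""
    sums = {}
    for cloud in positions:
        for word, (x, y) in cloud.items():
            sx, sy, n = sums.get(word, (0.0, 0.0, 0))
            sums[word] = (sx + x, sy + y, n + 1)
    return {word: (sx / n, sy / n) for word, (sx, sy, n) in sums.items()}


clouds = [
    {"sgd": (10.0, 20.0), "convex": (50.0, 5.0)},
    {"sgd": (30.0, 40.0)},
]
desired = average_shared_positions(clouds)
```

A word that appears in only one cloud, such as "convex" here, simply keeps its current position as its desired position.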

In principle, this process would be repeated until the final 
locations are the same as the desired ones, that is, when 
the Spiral Algorithm does not modify the given positions. 
At that point all shared words will be in precisely identical 
positions across the clouds. However, this process does not 
always converge, so in practice, we stop after a fixed number 
of iterations. 

Moreover, we find a more serious problem with the
iterative algorithm: it tends to move words far
away from the center, because this makes it easier to place 
shared words in the same position across clouds. This results 
in sparse layouts with excessive white space that are visually 
unappealing. 



4.3 Coordinated Layout: Gradient Approach 

In this section, we present a new method to build a storm 
by solving an optimization problem. This will provide us 
with additional flexibility to incorporate aesthetic constraints 
into storm construction, because we can incorporate them 
as additional terms in the objective function. This will allow 
us to avoid the unsightly sparse layouts which are sometimes 
produced by the iterative algorithm. 

We call the objective function the Discrepancy Between
Similarities (DBS). The DBS is a function of the set of clouds
$\{v_1, \ldots, v_N\}$ and the set of documents $\{u_1, \ldots, u_N\}$, and
measures how well the storm fits the document corpus. It is

$$
f_{u_1, \ldots, u_N}(v_1, \ldots, v_N) = \sum_{1 \le i < j \le N} \bigl( d_u(u_i, u_j) - d_v(v_i, v_j) \bigr)^2 + \sum_{1 \le i \le N} c(u_i, v_i), \tag{1}
$$

where $d_u$ is a distance metric between documents and $d_v$ a
metric between clouds. The DBS is to be minimized as a
function of $\{v_i\}$. The first summand, which we call stress,
formalizes the idea that similar documents should have
similar clouds and different documents, different clouds. The
second summand uses a function that we call the
correspondence function $c(\cdot, \cdot)$, which should be chosen to ensure that
each cloud $v_i$ is a good representation of its document $u_i$.

The stress part of the objective function is inspired by 
multidimensional scaling (MDS). MDS is a method for di- 
mensionality reduction of high-dimensional data pQ. Our 
use of the stress function is slightly different from the common
one, because instead of projecting the documents onto a
low-dimensional space, such as $\mathbb{R}^2$, we are mapping
documents to the space of word clouds. The space of word clouds
is itself high-dimensional, and indeed, might have greater
dimension than the original space. Additionally, the space of
word clouds is not Euclidean because of the non-overlapping
constraints.

For the metric $d_u$ among documents, we use Euclidean
distance. For the dissimilarity function $d_v$ between clouds,
we use

$$
d_v(v_i, v_j) = \sum_{w \in W_i \cup W_j} (s_{iw} - s_{jw})^2 + \kappa \sum_{w \in W_i \cap W_j} \bigl[ (x_{iw} - x_{jw})^2 + (y_{iw} - y_{jw})^2 \bigr],
$$

where $\kappa \ge 0$ is a parameter that determines the strength of
each part. Note that the first summand considers all words
in either cloud, and the second only the words that appear in
both clouds. (If a word does not appear in a cloud, we treat
its size as zero.) The intuition is that clouds are similar if
their words have similar sizes and locations. Also note that,
in contrast to the previous layout algorithm, by optimizing
this function we will also determine the words' sizes.

The difference between the objective functions for MDS
and DBS is that the DBS adds the correspondence function
$c(u_i, v_i)$. In MDS, the position of a data point in the target
space is not interpretable on its own, but only relative to the
other points. In contrast, in our case each word cloud must
accurately represent its document. Ensuring this is the role
of the correspondence function. In this work we use

$$
c(u_i, v_i) = \sum_{w \in W_i} (u_{iw} - s_{iw})^2, \tag{2}
$$

where, recall, $u_{iw}$ is the tf of word $w$ in document $i$.
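
To make the pieces of the DBS concrete, the following Python sketch evaluates it for a small storm. It assumes that the cloud dissimilarity is a sum of squared size differences over the union of words plus a weighted sum of squared position differences over shared words; the weight is called kappa here. The data layout (documents as {word: tf}, clouds as {word: (size, x, y)}) and all function names are assumptions of this example.

```python
import math
from itertools import combinations


def d_u(u_i, u_j):
    """Euclidean distance between two documents given as {word: tf}."""
    words = set(u_i) | set(u_j)
    return math.sqrt(sum((u_i.get(w, 0.0) - u_j.get(w, 0.0)) ** 2 for w in words))


def d_v(v_i, v_j, kappa=1.0):
    """Dissimilarity between two clouds given as {word: (size, x, y)}."""
    missing = (0.0, 0.0, 0.0)  # a word absent from a cloud has size zero
    size_term = sum(
        (v_i.get(w, missing)[0] - v_j.get(w, missing)[0]) ** 2
        for w in set(v_i) | set(v_j)
    )
    position_term = sum(
        (v_i[w][1] - v_j[w][1]) ** 2 + (v_i[w][2] - v_j[w][2]) ** 2
        for w in set(v_i) & set(v_j)
    )
    return size_term + kappa * position_term


def correspondence(u_i, v_i):
    """Squared difference between term frequencies and displayed sizes."""
    return sum((u_i.get(w, 0.0) - size) ** 2 for w, (size, _, _) in v_i.items())


def dbs(docs, clouds, kappa=1.0):
    """Stress over all pairs of clouds plus the correspondence terms."""
    stress = sum(
        (d_u(docs[i], docs[j]) - d_v(clouds[i], clouds[j], kappa)) ** 2
        for i, j in combinations(range(len(docs)), 2)
    )
    return stress + sum(correspondence(u, v) for u, v in zip(docs, clouds))


docs = [{"x": 3.0}, {"y": 4.0}]
clouds = [{"x": (3.0, 0.0, 0.0)}, {"y": (4.0, 1.0, 1.0)}]
value = dbs(docs, clouds)
```

In this toy storm the sizes match the term frequencies exactly, so the correspondence terms vanish and the whole value comes from the stress.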

We also need additional terms to ensure that words
do not overlap, and to favor compact configurations. We
introduce these constraints as two penalty terms. When two
words overlap, we add a penalty proportional to the square
of the minimum distance required to separate them; call
this distance $O_{i;w,w'}$. We favor compactness by adding a
penalty proportional to the squared distance from each
word to the center; by convention we define the origin
as the center, so this is simply the squared norm of the word's position.
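
For intuition, the separation distance can be computed in closed form if words are approximated by axis-aligned bounding boxes, which is an assumption of this sketch (the layout itself tests intersections at the glyph level). The function below returns the smallest shift along a single axis that separates two boxes, or zero if they do not overlap; the function name is hypothetical.

```python
def separation_distance(a, b):
    """Smallest single-axis shift that separates boxes a and b, each (x, y, w, h).

    Returns 0.0 when the boxes do not overlap."""
    # Penetration depth along each axis: how far one box must move, in the
    # cheaper of the two directions, to stop overlapping on that axis.
    pen_x = min(a[0] + a[2] - b[0], b[0] + b[2] - a[0])
    pen_y = min(a[1] + a[3] - b[1], b[1] + b[3] - a[1])
    if pen_x <= 0 or pen_y <= 0:
        return 0.0
    return float(min(pen_x, pen_y))


# Two word boxes overlapping by 2 units horizontally and 3 units vertically.
penalty = separation_distance((0, 0, 10, 4), (8, 1, 10, 4)) ** 2
```

The overlap penalty for a pair of words is then the square of this distance, and it vanishes as soon as the two boxes are disjoint.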

Therefore, the final objective function that we use to lay
out word storms in the gradient-based method is

$$
g_\lambda(v_1, \ldots, v_N) = f_{u_1, \ldots, u_N}(v_1, \ldots, v_N) + \lambda \sum_{i=1}^{N} \sum_{w, w' \in W_i} O_{i;w,w'}^2 + \mu \sum_{i=1}^{N} \sum_{w \in W_i} \|p_{iw}\|^2, \tag{3}
$$

where $\lambda$ and $\mu$ are parameters that determine the strength
of the overlap and compactness penalties, respectively.

We optimize (3) by solving a sequence of optimization
problems for increasing values $\lambda_0 < \lambda_1 < \lambda_2 < \ldots$ of the
overlap penalty. We increase $\lambda$ exponentially until no words
overlap in the final solution. Each subproblem is minimized
using gradient descent, initialized from the solution of the
previous subproblem.
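
The continuation scheme can be illustrated on a toy problem. In the Python sketch below, words are replaced by discs, the stress and correspondence terms are dropped, and only the overlap and compactness penalties are minimized; these simplifications, the step-size rule, the doubling factor and the overlap tolerance are all assumptions of the example, not details taken from the method above.

```python
import math


def max_overlap(pos, radii):
    """Largest pairwise overlap depth between discs (0.0 if none overlap)."""
    worst = 0.0
    for i in range(len(pos)):
        for j in range(i + 1, len(pos)):
            dist = math.hypot(pos[i][0] - pos[j][0], pos[i][1] - pos[j][1])
            worst = max(worst, radii[i] + radii[j] - dist)
    return worst


def gradient(pos, radii, lam, mu):
    """Gradient of lam * sum(overlap^2) + mu * sum(||p||^2) w.r.t. positions."""
    grad = [[2 * mu * x, 2 * mu * y] for x, y in pos]
    for i in range(len(pos)):
        for j in range(i + 1, len(pos)):
            dx, dy = pos[i][0] - pos[j][0], pos[i][1] - pos[j][1]
            dist = math.hypot(dx, dy) or 1e-9
            overlap = radii[i] + radii[j] - dist
            if overlap > 0:
                # d(overlap^2)/dp_i = -2 * overlap * (p_i - p_j) / dist
                gx = -2 * lam * overlap * dx / dist
                gy = -2 * lam * overlap * dy / dist
                grad[i][0] += gx
                grad[i][1] += gy
                grad[j][0] -= gx
                grad[j][1] -= gy
    return grad


def layout(pos, radii, mu=0.01, lam=1.0, tol=1e-3, steps=500, max_rounds=30):
    """Solve a sequence of subproblems with exponentially increasing lam."""
    pos = [list(p) for p in pos]
    for _ in range(max_rounds):
        lr = 0.1 / (lam + mu)  # smaller steps as the penalty gets stiffer
        for _ in range(steps):
            for p, g in zip(pos, gradient(pos, radii, lam, mu)):
                p[0] -= lr * g[0]
                p[1] -= lr * g[1]
        if max_overlap(pos, radii) <= tol:
            break
        lam *= 2.0  # the next subproblem starts from the current solution
    return pos, lam


radii = [1.0, 1.0, 1.0]
final_pos, final_lam = layout([(0.0, 0.0), (0.5, 0.0), (0.0, 0.5)], radii)
```

Because a quadratic penalty never removes overlaps exactly, the sketch stops once the largest overlap falls below a small tolerance rather than at zero.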

4.4 Coordinated Layout: Combined Algorithm 

The iterative and gradient algorithms have complemen- 
tary strengths. The iterative algorithm is fast, but as it 
does not enforce the compactness of the clouds, the words 
drift away from the center. On the other hand, the gradi- 
ent method is able to create compact clouds, but it requires 
many iterations to converge and the layout strongly depends 
on the initialization. Therefore we combine the two meth- 
ods, using the final result of the iterative algorithm as the 
starting point for the gradient method. From this initializa- 
tion, the gradient method converges much faster, because it 
starts off without overlapping words. The gradient method 
tends to improve the initial layout significantly, because it 
pulls words closer to the center, creating a more compact 
layout. Also, the gradient method tends to pull together 
the locations of shared words for which the iterative method 
was not able to converge to a single position. 

5. EVALUATION 

The evaluation is divided into three parts: a qualitative
analysis, an automatic analysis and a user study. We use two
different data sets. First, we use the scientific papers presented
at the ICML 2012 conference, where we deployed a storm
on the main conference Web site to compare the presented
papers and help people decide among sessions.²

Second, we also use a data set provided by the Research 
Perspectives project³ [8], a project that aims to offer a
visualization of the research portfolios of funding agencies. This
data set contains the abstracts of the proposals for funded 
research grants from various funding agencies. We use a 
corpus of 2358 abstracts from the UK's Engineering and 
Physical Sciences Research Council (EPSRC). Each grant 
belongs to exactly one of the following programmes: Infor- 
mation and Communications Technology (626 grants), Phys- 
ical Sciences (533), Mathematical Sciences (331), Engineer- 
ing (317), User-Led Research (291) and Materials, Mechan- 
ical and Medical Engineering (264). Each of these top-level 
programmes has several subprogrammes that correspond to
more specific research areas. 

5.1 Qualitative Analysis 

In this section, we discuss the presented storms qualita- 
tively, focusing on the additional information that is ap- 
parent from coordinated storms compared to independently 
built clouds.

² http://icml.cc/2012/whatson/

³ Also see http://www.researchperspectives.org



First, we consider a storm that displays six EPSRC research
programmes, five of which are different
subprogrammes of material sciences, while the sixth
is the mathematical sciences programme. For this data set
we present both a set of independent clouds (Figure 4) and
a storm generated by the combined algorithm (Figure 5).
From either set of clouds, we can get a superficial idea of the
corpus. We can see the most important words such as
"materials", which appears in the first five clouds, and some other
words like "alloys", "polymer" and "mathematical". However,
it is hard to get more information than this from the
independent clouds.

On the other hand, by looking at the coordinated storm
we can detect more properties of the corpus. First, it is
instantly clear that the first five documents are similar and
that the sixth one is the most different from all the others.
This is because the storm reveals the shared structure
in the documents, formed by shared words such as "materials",
"properties" and "applications". Second, we can easily
tell the presence or absence of words across clouds because
of the consistent attributes and locations. For example, we
can quickly see that "properties" does not appear in the sixth
cloud or that "coatings" only occurs in two of the six.
Finally, the transparency of the words allows us to spot the
informative terms quickly, such as "electron" (a), "metal" (b),
"light" (c), "crack" (d), "composite" (e) and "problems" (f).
All of these terms are informative of the document content but
are difficult to spot in the independent clouds of Figure 4.
Overall, the coordinated storm seems to offer a richer
and more comfortable representation that allows deeper analysis
than the independently generated clouds.

Similarly, from the ICML 2012 data set, Figure 1 shows
a storm containing all the papers from a single conference
session. It is immediately apparent from the clouds that the
session discusses optimization algorithms. It is also clear
that papers (c) and (d) are closely related, since they share
many words such as "sgd", "stochastic" and "convex", which
results in similar layouts. The fact that shared words take
similar positions can also force unique words into similar
positions, which can make it easy to find terms that
differentiate the clouds. For example, we can see how
"herding" (f), "coordinated" (g) and "similarity" (h) are in the
same location, as are "semidefinite" (a), "quasi-newton" (b)
and "nonsmooth" (d).

Finally, Figures 2 and 3 show an example of a hierarchical
set of storms generated from the EPSRC grant abstracts.
Figure 2 presents a storm created by grouping all
abstracts by their top-level scientific programme. There we
can see two pairs of similar programmes: Chemistry and
Physical Sciences; and Engineering and Information and
Communications Technology. In Figure 3 we show a second
storm composed of six individual grants from the Complexity
programme (cloud (e) in Figure 2). It is interesting to see
how big words at the top level such as "complex", "systems",
"network" and "models" appear with different weights at the
grant level. In particular, the term "complex", which is rare
when looking at the top level, appears everywhere inside the
Complexity programme. Because of our use of transparency,
this term is therefore prominent in the top-level storm but
less noticeable in the lower-level storm.

Figure 4: Independent Clouds visualizing six EPSRC Scientific
Programmes. These programmes are also represented in Figure 5.

Figure 5: Coordinated storm visualizing six EPSRC Scientific
Programmes. These programmes are also represented as
independent clouds in Figure 4. Compared to that figure, it is
much easier to see the differences between clouds.

5.2 Automatic Evaluation

Apart from evaluating the resulting storm qualitatively,
we propose a method to evaluate word storm algorithms
automatically. The objective is to assess how well the relations
among documents are represented in the clouds. The motivation
is similar in spirit to the celebrated BLEU measure
in machine translation [5]: By evaluating layout algorithms
with an automatic process rather than conducting a user
study, the process can be faster and inexpensive, allowing
rapid comparison of algorithms.

Algorithm                       Time (s)  Compactness (%)  Accuracy (%)

Lower Bound                     -         -                26.5
Independent Clouds              143.3     35.12            23.4
Coordinated Storm (Iterative)   250.9     20.39            54.7
Coordinated Storm (Combined)    2658.5    33.71            54.2
Upper Bound                     -         -                67.9

Table 1: Comparison of the results given by different algorithms
using the automatic evaluation.

Our automatic evaluation requires a corpus of labelled 
documents, e.g., with a class label that indicates their top- 
ics. The main idea is: If the visualization is faithful to the 
documents, then it should be possible to classify the docu- 
ments using the pixels in the visualization rather than the 
words in the documents. So we use classification accuracy 
as a proxy measure for visualization fidelity. 

In the context of word storms, the automatic evaluation
consists of: (a) generating a storm from a labelled corpus
with one cloud per document, (b) training a document classifier
using the pixels of the clouds as attributes and (c) testing
the classifier on a held-out set to obtain the classification
accuracy. More faithful visualizations are expected to have
better classification accuracy.
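
The three steps above can be sketched in a few lines of Python. This is only an illustrative sketch and not the setup used in the experiments: the toy "clouds" are hypothetical flattened pixel vectors, all function names are our own, and a simple nearest-centroid classifier stands in for the real classifier so that the example needs nothing beyond the standard library.

```python
import random

def split_train_test(items, test_fraction=0.2, seed=0):
    """Shuffle the items and split them into (train, test) lists."""
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

def train_centroids(train):
    """Average the pixel vectors of each class (stand-in classifier)."""
    sums, counts = {}, {}
    for pixels, label in train:
        if label not in sums:
            sums[label] = [0.0] * len(pixels)
            counts[label] = 0
        sums[label] = [s + p for s, p in zip(sums[label], pixels)]
        counts[label] += 1
    return {lab: [s / counts[lab] for s in vec] for lab, vec in sums.items()}

def predict(centroids, pixels):
    """Return the label whose centroid is closest to the pixel vector."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(centroids, key=lambda lab: sq_dist(centroids[lab], pixels))

def accuracy(centroids, test):
    """Fraction of held-out clouds whose label is predicted correctly."""
    correct = sum(1 for pixels, label in test if predict(centroids, pixels) == label)
    return correct / len(test)

def make_toy_clouds(n_per_class=10, n_pixels=16, seed=1):
    """Hypothetical data: 'dark' clouds have low pixel values, 'bright' ones high."""
    rng = random.Random(seed)
    clouds = []
    for label, base in (("dark", 20), ("bright", 230)):
        for _ in range(n_per_class):
            pixels = [base + rng.randint(-10, 10) for _ in range(n_pixels)]
            clouds.append((pixels, label))
    return clouds

clouds = make_toy_clouds()                 # (a) one pixel vector per document
train, test = split_train_test(clouds)     # 80/20 split
model = train_centroids(train)             # (b) train on pixels
print(f"accuracy = {accuracy(model, test):.2f}")  # (c) held-out accuracy
```

Because the score depends only on the rendered pixels, any layout algorithm can be plugged into step (a) and compared on the same footing.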

We use the Research Perspectives EPSRC data set with 
the research programme as class label. Thus, we have a 
single-label classification problem with 6 classes. The data 
was randomly split into a training and test set using an 
80/20 split. We use the word storm algorithms to create 
one cloud per abstract, so there are 2358 clouds in total. 
We compare three layout algorithms: (a) creating the clouds 
independently using the Spiral Algorithm, which is our base- 
line; (b) the iterative algorithm with 5 iterations and (c) the 
combined algorithm, using 5 iterations of the iterative algo- 
rithm to initialize the gradient method. 

We represent each cloud by a vector of the RGB values
of its pixels. To reduce the size of this representation, we
perform feature selection, discarding features with zero
information gain. We classify the clouds by using support
vector machines with normalized polynomial kernel.⁴
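
As a rough illustration of the feature selection step, the sketch below computes the information gain of each feature with respect to the class label and keeps only the features whose gain is greater than zero. It assumes that each feature value is treated as a discrete symbol; the function names and the tiny data set are hypothetical, and the toolkit used in the experiments may discretize pixel values differently.

```python
import math
from collections import Counter, defaultdict

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

def information_gain(feature_values, labels):
    """H(labels) minus the expected entropy after splitting on the feature."""
    groups = defaultdict(list)
    for value, label in zip(feature_values, labels):
        groups[value].append(label)
    total = len(labels)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional

def select_features(rows, labels, eps=1e-12):
    """Indices of the columns whose information gain is greater than zero."""
    keep = []
    for j in range(len(rows[0])):
        column = [row[j] for row in rows]
        if information_gain(column, labels) > eps:
            keep.append(j)
    return keep

# Hypothetical pixel features: column 0 is constant (e.g. background),
# column 1 separates the classes perfectly, column 2 only partially.
rows = [[255, 0, 7], [255, 0, 7], [255, 200, 7], [255, 200, 9]]
labels = ["a", "a", "b", "b"]
print(select_features(rows, labels))
```

A pixel that has the same value in every cloud, such as a background pixel that no word ever covers, carries no information about the class and is dropped by this criterion.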

In order to put the classification accuracy into context, we
present a lower bound obtained if all instances are classified
as the largest class (ICT), which produces an accuracy of
26.5%. To obtain an upper bound, we classify the documents
directly using bag-of-words features from the text,
which should perform better than transforming the text into
a visualization. Using a support vector machine, this yields
an accuracy of 67.9%.
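
The lower bound can be checked directly from the class sizes quoted earlier in this section. The short sketch below uses a hypothetical helper, `majority_baseline`, that returns the largest class and the accuracy obtained by always predicting it.

```python
# Programme sizes as quoted in the data set description.
class_counts = {
    "Information and Communications Technology": 626,
    "Physical Sciences": 533,
    "Mathematical Sciences": 331,
    "Engineering": 317,
    "User-Led Research": 291,
    "Materials, Mechanical and Medical Engineering": 264,
}

def majority_baseline(counts, total=None):
    """Largest class and the accuracy of always predicting that class."""
    if total is None:
        total = sum(counts.values())
    label = max(counts, key=counts.get)
    return label, counts[label] / total

# Note: the quoted counts add up to 2362, slightly more than the 2358
# abstracts stated for the corpus; both denominators round to 26.5%.
label, acc = majority_baseline(class_counts, total=2358)
print(label, f"{100 * acc:.1f}%")
```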

Apart from the classification accuracy, we also report the
running time of the layout algorithm (in seconds)⁵ and, as
a simple aesthetic measure, the compactness of the word
clouds. We compute the compactness by taking the minimum
bounding box of the cloud and calculating the percentage
of non-background pixels. We use this measure because
informally we noticed that more compact clouds tend to be
more visually appealing.

4 The classification is performed by using the SMO
implementation of Weka.

5 All experiments were run on a 3.1 GHz Intel Core i5 server
with 8GB of RAM.
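
The compactness measure described above can be sketched as follows. The example assumes that a cloud is available as a 2D grid of pixel values and that a single known value marks the background; both the function name and the toy image are our own illustration.

```python
def compactness(image, background=0):
    """Percentage of non-background pixels inside the minimum bounding box."""
    coords = [
        (r, c)
        for r, row in enumerate(image)
        for c, pixel in enumerate(row)
        if pixel != background
    ]
    if not coords:
        return 0.0  # an empty cloud has no bounding box
    rows = [r for r, _ in coords]
    cols = [c for _, c in coords]
    height = max(rows) - min(rows) + 1
    width = max(cols) - min(cols) + 1
    return 100.0 * len(coords) / (height * width)

# Toy image: 4 inked pixels inside a 2 x 3 bounding box.
image = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 1, 0, 1, 0],
    [0, 0, 0, 0, 0],
]
print(f"{compactness(image):.1f}%")
```

Empty margins around the cloud do not affect the score, since only the area inside the bounding box is counted.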

The results are shown in Table 1. Creating the clouds
independently is faster than any coordinated algorithm and
also produces very compact clouds. However, for classification,
this method does no better than the majority-class lower
bound (23.4% versus 26.5%). The algorithms that create
coordinated clouds, the iterative and the combined algorithm,
achieve a classification accuracy of about 54% (54.7% and
54.2%, respectively), which is substantially higher than the
lower bound. This confirms the intuition that by coordinating
the clouds, the relations among documents are better
represented.

The differences between the coordinated methods can be 
seen in the running time and in the compactness. Although 
the iterative algorithm achieves much better classification 
accuracy than the baseline, this is at the cost of producing 
much less compact clouds. The combined algorithm, on the 
other hand, is able to match the compactness of indepen- 
dently built clouds (33.71% combined and 35.12% indepen- 
dent) and the classification accuracy of the iterative algo- 
rithm. The combined algorithm is significantly more expen- 
sive in computation time, although it should be noted that 
even the combined algorithm uses only 1.1s for each of the 
2358 clouds in the storm. Therefore, although the combined 
algorithm requires more time, it seems the best option, be- 
cause the resulting storm offers good classification accuracy 
without losing compactness. 
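
The per-cloud cost quoted above follows from the totals in Table 1, as the short calculation below shows. The variable names are our own; the numbers are taken from the table and from the corpus size.

```python
N_CLOUDS = 2358  # one cloud per abstract

# Total layout time in seconds, as reported in Table 1.
layout_time_s = {
    "Independent Clouds": 143.3,
    "Coordinated Storm (Iterative)": 250.9,
    "Coordinated Storm (Combined)": 2658.5,
}

# Average layout time per cloud for each algorithm.
per_cloud_s = {name: total / N_CLOUDS for name, total in layout_time_s.items()}

for name, seconds in per_cloud_s.items():
    print(f"{name}: {seconds:.2f} s per cloud")
```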

A potential pitfall with automatic evaluations is that it
is possible for algorithms to game the system, producing
visualizations that score better but look worse. This has
arguably happened in machine translation, in which BLEU
has been implicitly optimized, and possibly overfit, by the
research community for many years. For this reason it is
important to combine automatic evaluation with user studies.
Furthermore, in our case, none of our algorithms optimizes
the classification accuracy directly; instead they follow
very different considerations. But the concern of "research
community overfitting" is one to take seriously if automated
evaluation of visualization is more widely adopted.

5.3 User Study 

In order to confirm our results using the automatic
evaluation, we conducted a pilot user study comparing the
standard independent word clouds with coordinated storms
created by the combined algorithm. The study consisted of 5
multiple choice questions. In each of them, the users were
presented with six clouds and were asked to perform a simple
task. The tasks were of two kinds: checking the presence
of words and comparing documents. The clouds for
each question were generated either as independent clouds
or a coordinated storm. In every question, the user received
one of the two versions randomly.⁶ Although users were
told in the beginning that word clouds had been built using
different methods, the number of different methods was
not revealed, the characteristics of the methods were not
explained and they did not know which method was used
for each question. Moreover, in order to reduce the effect of
possible bias factors, the tasks were presented in a random
order and the 6 clouds in each question were also sorted
randomly. The study was taken by 20 people, so each question
was answered 10 times using the independent clouds and 10
times using a coordinated storm.

Task                              Measure        Independent  Coordinated
                                                 clouds       storm

Select clouds with word           Precision (%)  [illegible]  [illegible]
"technology"                      Recall (%)     65           85
                                  Time (s)       51 ± 23      36 ± 10

Select clouds without word        Precision (%)  [illegible]  [illegible]
"energy"                          Recall (%)     85           95
                                  Time (s)       56 ± 18      40 ± 14

Select clouds with words          Precision (%)  75           90
"models", "network" and           Recall (%)     90           100
"system"                          Time (s)       87 ± 35      124 ± 46

Select the most different cloud   Accuracy (%)   30           90
                                  Time (s)       36 ± 12      23 ± 10

Select the most similar pair      Accuracy (%)   10           70
of clouds                         Time (s)       54 ± 23      75 ± 19

Table 2: Results of the user study. Accuracy on the last two
questions, which required comparing documents, is much higher
for the users that were presented with a coordinated storm.
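
The study design above requires a random assignment that is also balanced, so that each question is answered equally often under both conditions. The exact procedure is not described here; the sketch below shows one simple way to obtain such an assignment, with hypothetical function and condition names.

```python
import random

def balanced_assignment(n_users=20, n_questions=5, seed=0):
    """For each question, assign half of the users to each version at random.

    Returns plan[q][u], the version shown to user u for question q.
    """
    rng = random.Random(seed)
    half = n_users // 2
    plan = []
    for _ in range(n_questions):
        versions = ["independent"] * half + ["coordinated"] * (n_users - half)
        rng.shuffle(versions)  # random order, but counts stay fixed
        plan.append(versions)
    return plan

plan = balanced_assignment()
for q, versions in enumerate(plan, start=1):
    print(q, versions.count("independent"), versions.count("coordinated"))
```

Shuffling a fixed multiset of labels, rather than flipping a coin per user, is what guarantees the 10/10 split for every question.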

Table 2 presents the results of the study. The first three
questions asked the users to select the clouds that contained 
or lacked certain words. The results show that although the 
precision and recall are high in both cases and the differ- 
ences are small, the coordinated storm always has a higher 
score than the independent clouds. This might be because 
the structured layout helped the users to find words, even 
though the users did not know how the storms were laid out. 
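
For the word-presence questions, a user's answer is a set of selected clouds, which can be scored against the set of clouds that actually satisfy the condition. The sketch below shows the standard precision and recall computation for one answer; the cloud identifiers are hypothetical, and how the per-user scores were aggregated in the study is not specified here.

```python
def precision_recall(selected, relevant):
    """Precision and recall of a selected set against the correct set."""
    selected, relevant = set(selected), set(relevant)
    hits = len(selected & relevant)
    precision = hits / len(selected) if selected else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical answer: the word appears in clouds a, b, d and e,
# but the user selected clouds a, b and c.
precision, recall = precision_recall({"a", "b", "c"}, {"a", "b", "d", "e"})
print(f"precision = {precision:.2f}, recall = {recall:.2f}")
```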

The last two questions asked the users to compare the
documents and to select "the cloud that is most different
from all the others" and "the most similar pair of clouds". In
the first case, two clouds had a cosine similarity⁷ lower than
0.3 with all the others, while all other pairs had a similarity
higher than 0.5. In the last question, the most similar pair of
clouds had a cosine similarity of 0.71, while the score of the
second most similar pair was 0.48. As these questions have
only one correct answer, the measure used is the accuracy,
instead of the precision and recall.
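
The similarity scores above are cosine similarities between bag-of-words frequency vectors. A minimal sketch of that computation is given below, together with a hypothetical helper that restricts a document to its most frequent words (the clouds in the study show the top 25). The two toy documents are our own.

```python
import math
from collections import Counter

def cosine_similarity(freq_a, freq_b):
    """Cosine similarity between two word-frequency dictionaries."""
    shared = set(freq_a) & set(freq_b)
    dot = sum(freq_a[w] * freq_b[w] for w in shared)
    norm_a = math.sqrt(sum(v * v for v in freq_a.values()))
    norm_b = math.sqrt(sum(v * v for v in freq_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def top_words(freq, k=25):
    """Keep only the k most frequent words of a document."""
    return dict(Counter(freq).most_common(k))

doc_a = Counter("materials properties materials alloys".split())
doc_b = Counter("materials properties polymer".split())

print(f"all words:   {cosine_similarity(doc_a, doc_b):.2f}")
print(f"top 2 words: {cosine_similarity(top_words(doc_a, 2), top_words(doc_b, 2)):.2f}")
```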

The results for the last two questions show that the
coordinated storm outperforms the independent clouds. While
90% and 70% of the users presented with the coordinated
version answered correctly, only 30% and 10%
did so with the independent version. This confirms that
coordinated storms allow the users to contrast the clouds
and understand their relations, while independent clouds are
misleading in these tasks.

6 The random process ensured that we would have the same 
number of answers for each method 

7 The documents were represented using the bag-of-words
representation with frequencies. The cosine similarity was
computed twice: once considering all words and once considering
only the top 25 words included in the cloud.



Although the sample size is small, results favour the co- 
ordinated storm. In particular, when the users are asked 
to compare clouds, the differences in user accuracy are ex- 
tremely large. Regarding the answering time, the differences 
between the two conditions are not significant. 

6. RELATED WORK 

Word clouds were inspired by tag clouds, which first ap- 
peared as an attractive way to summarize and browse a 
user-specified folksonomy. Originally, the tags were orga- 
nized in horizontal lines and sorted by alphabetical order, a 
layout that is still used in many websites such as Flickr and 
Delicious. Word clouds extend this idea to document
visualization. Of the many word cloud generators, one of the
most popular is Wordle [6, 13], which produces particularly
appealing layouts.

However, in contrast to visualizing a single document, the
topic of visualizing corpora has received much less attention.
Several researchers have proposed to create the clouds by using
different importance measures, such as tf * idf [7] or the
relative frequency when only the relations of a single
document have to be analysed [10, 3]. Nevertheless, without a
different layout algorithm, clouds are still difficult to
compare because they do not attempt to follow the coordination
of similarity principle and shared words are hard to find.

Collins et al. [4] presented Parallel Tag Clouds, a method 
that aims to make comparisons easier by representing the 
documents as lists. The closest related work was presented 
by Cui et al. [5], which was later improved by Wu et al. [14].
This work proposes using a sequence of word clouds along 
with a trend chart to show the evolution of a corpus over 
time. They present a new layout algorithm with the goal 
of keeping semantically similar words close to each other in 
each cloud. This goal is very different from that of our work: 
Preserving semantic relations between words within a cloud 
is different than coordinating similarities across clouds, and 
does not necessarily result in similar documents being rep- 
resented by similar clouds. 

7. CONCLUSIONS 

We have introduced the concept of a word storm, which
is a group of word clouds designed for the visualization of
a corpus of documents. We presented a series of principles
for effective storms, arguing that the clouds in a storm
should be built in a coordinated fashion to facilitate
comparison. We presented a novel algorithm that builds storms
in a coordinated fashion, placing shared words in a similar
location across clouds, so that similar documents will
have similar clouds. Using both an automatic evaluation
and a user study, we showed that coordinated storms were
markedly superior to independent word clouds for the
purposes of comparing and contrasting documents.